ShARe/CLEF eHealth 2013 Named Entity Recognition and Normalization of Disorders Challenge

Authors

  • Jon D. Patrick
  • Leila Safari
  • Ying Ou

Abstract

Objective: Clinical documents contain abundant mentions of clinical conditions, anatomical sites, medications and procedures. This paper describes the use of a cascade of machine learners to automatically extract mentions of disorder named entities from clinical notes.

Tasks: A Conditional Random Field (CRF) machine learner was used for named entity recognition, and Support Vector Machines (SVM) were used to capture more complex (multi-word) named entities. First, the training data was converted to the CRF format, and different feature sets were compared using 10-fold cross validation to find the best feature set for the machine learning model. Second, the identified named entities were passed to the SVM, which looked for relations among the identified disorder mentions to decide whether they are part of a complex disorder.

Approach: Our approach is based on a novel supervised learning model that incorporates two machine learning algorithms (CRF and SVM). Each step was evaluated with precision, recall and F-score.

Resources: We used several tools created in our lab, including the TTSCT (Text to SNOMED CT) service, the Lexical Management System (LMS) and the Ring-fencing approach. A set of gazetteers was also created from the training data and used in the analysis.

Results: Named entity recognition achieved a precision of 0.766, recall of 0.726 and F-score of 0.746 in 10-fold cross validation on the training data. Relation extraction achieved a precision, recall and F-score of 0.927 each in 5-fold cross validation on the training data. On the official test data in strict mode, the system achieved a precision of 0.686, recall of 0.539 and F-score of 0.604, placing our team 11th of 25 participating teams. In relaxed mode, it achieved a precision of 0.912, recall of 0.701 and F-score of 0.793, placing our team 12th.
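The following minimal Python sketch shows how the reported metrics relate to each other. It computes the balanced F-score (the harmonic mean of precision and recall) and scores predicted mention spans against gold spans under two matching rules. The sketch makes two simplifying assumptions that do not come from the paper. First, strict mode means an exact boundary match and relaxed mode means any character overlap. Second, each mention is a single contiguous span. The function names are hypothetical, and this is not the official challenge scorer.

```python
from typing import List, Tuple

# Assumption: a mention is one contiguous (start, end) character span, end exclusive.
# Disjoint (multi-fragment) disorders would need a set of spans per mention instead.
Span = Tuple[int, int]


def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


def exact_match(a: Span, b: Span) -> bool:
    """Strict rule (assumed): both boundaries must agree."""
    return a == b


def spans_overlap(a: Span, b: Span) -> bool:
    """Relaxed rule (assumed): the spans share at least one character."""
    return a[0] < b[1] and b[0] < a[1]


def evaluate(gold: List[Span], pred: List[Span], mode: str = "strict") -> Tuple[float, float, float]:
    """Return (precision, recall, F-score) for predicted spans against gold spans."""
    if mode == "strict":
        match = exact_match
    elif mode == "relaxed":
        match = spans_overlap
    else:
        raise ValueError(f"unknown mode: {mode}")
    # Precision: fraction of predictions that match some gold mention.
    matched_pred = sum(any(match(p, g) for g in gold) for p in pred)
    # Recall: fraction of gold mentions matched by some prediction.
    matched_gold = sum(any(match(p, g) for p in pred) for g in gold)
    precision = matched_pred / len(pred) if pred else 0.0
    recall = matched_gold / len(gold) if gold else 0.0
    return precision, recall, f_score(precision, recall)


if __name__ == "__main__":
    gold = [(0, 10), (20, 35)]
    pred = [(0, 10), (22, 30), (40, 45)]
    print(evaluate(gold, pred, "strict"))   # precision 1/3, recall 0.5, F 0.4
    print(evaluate(gold, pred, "relaxed"))  # precision 2/3, recall 1.0, F 0.8
```

Recomputing from the rounded test-set figures reproduces the reported F-scores. `round(f_score(0.686, 0.539), 3)` gives 0.604, and `round(f_score(0.912, 0.701), 3)` gives 0.793. The same formula applied to the rounded 10-fold figures (0.766, 0.726) gives 0.745 rather than the reported 0.746. This small gap may stem from rounding or from averaging F-scores across folds.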
A multi-stage supervised machine learning method with mixed computational strategies appears to provide a reasonable approach to the automated extraction of disorders.
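The hand-off between the two stages of such a cascade can be sketched in Python as follows. This is a hypothetical illustration, not the authors' code. It assumes the CRF emits BIO tags (`B-DISORDER`, `I-DISORDER`, `O`) and groups them into fragments. It then builds a small feature dictionary for each ordered fragment pair, so that a second-stage classifier can decide whether the two fragments belong to one complex disorder. The feature names, the example sentence and the stub classifier are all assumptions. In the described system, a trained SVM plays the classifier's role.

```python
from itertools import combinations
from typing import Callable, Dict, List, Tuple

# Hypothetical representation: a fragment is a [start, end) range of token indices.
Fragment = Tuple[int, int]
Features = Dict[str, float]


def bio_to_fragments(labels: List[str]) -> List[Fragment]:
    """Group stage-1 (CRF) BIO tags into contiguous entity fragments."""
    fragments: List[Fragment] = []
    start = None
    for i, label in enumerate(labels):
        if label.startswith("B-") or (label.startswith("I-") and start is None):
            # A new fragment begins; close any fragment that is still open.
            if start is not None:
                fragments.append((start, i))
            start = i
        elif label == "O" and start is not None:
            fragments.append((start, i))
            start = None
    if start is not None:
        fragments.append((start, len(labels)))
    return fragments


def pair_features(tokens: List[str], a: Fragment, b: Fragment) -> Features:
    """Illustrative (assumed) features for a pair where fragment `a` precedes `b`."""
    gap = [t.lower() for t in tokens[a[1]:b[0]]]
    return {
        "token_gap": float(len(gap)),
        "comma_between": float("," in gap),
        "and_between": float("and" in gap),
    }


def link_fragments(
    tokens: List[str],
    labels: List[str],
    classify: Callable[[Features], bool],
) -> List[Tuple[Fragment, Fragment]]:
    """Stage 2: ask the classifier whether each fragment pair forms one disorder."""
    fragments = bio_to_fragments(labels)
    # combinations() preserves list order, so `a` always precedes `b` in the text.
    return [
        (a, b)
        for a, b in combinations(fragments, 2)
        if classify(pair_features(tokens, a, b))
    ]


if __name__ == "__main__":
    tokens = ["The", "left", "atrium", "is", "moderately", "dilated", "."]
    labels = ["O", "B-DISORDER", "I-DISORDER", "O", "O", "B-DISORDER", "O"]

    def stub(features: Features) -> bool:
        # Placeholder for a trained SVM: link nearby fragments not split by a comma.
        return features["token_gap"] <= 3 and features["comma_between"] == 0.0

    print(link_fragments(tokens, labels, stub))  # [((1, 3), (5, 6))]
```

A trained model would replace `stub`. One option is a linear SVM fitted on feature dictionaries like these, for example scikit-learn's `DictVectorizer` followed by `LinearSVC`. The abstract does not list the SVM's actual features, so the ones above are placeholders.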

Similar resources

Clinical Information Extraction at the CLEF eHealth Evaluation lab 2016

This paper reports on Task 2 of the 2016 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs. The task continued with named entity recognition and normalization in French narratives, as offered in CLEF eHealth 2015. Named entity recognition involved ten types of entities including disorders that were defined according to Sem...

NCBI at 2013 ShARe/CLEF eHealth Shared Task: Disorder Normalization in Clinical Notes with DNorm

We describe an application of DNorm – a mathematically principled and high performing methodology for disease recognition and normalization, even in the presence of term variation – to clinical notes. DNorm consists of a text processing pipeline, including the BANNER named entity recognizer to locate diseases in the text, and a novel machine learning approach based on pairwise learning to rank ...

CLEF eHealth Evaluation Lab 2015 Task 1b: Clinical Named Entity Recognition

This paper reports on Task 1b of the 2015 CLEF eHealth evaluation lab which extended the previous information extraction tasks of ShARe/CLEF eHealth evaluation labs by considering ten types of entities including disorders, that were to be extracted from biomedical text in French. The task consisted of two phases: entity recognition (phase 1), in which participants could supply plain or normaliz...

Overview of the ShARe/CLEF eHealth Evaluation Lab 2014

This paper reports on the 2nd ShARe/CLEF eHealth evaluation lab which continues our evaluation resource building activities for the medical domain. In this lab we focus on patients’ information needs as opposed to the more common campaign focus of the specialised information needs of physicians and other healthcare workers. The usage scenario of the lab is to ease patients and next-of-kins’ ease...

Multi-lingual ICD-10 Coding using a Hybrid rule-based and Supervised Classification Approach at CLEF eHealth 2017

In this paper we present our research efforts and obtained results within the CLEF eHealth challenge 2017, Track 1. The task involves the recognition and mapping of ICD-10 codes to English and French death certificates. Our approach proposes a two tier, two stage process. First, we use a rule-based system, based on handcrafted rules and the use of Apache Solr, to perform ICD-10 code Named Entit...

Publication date: 2013